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ULTRACOMPUTER  IMPLEMENTATION  OF  ODD-EVEN  REDUCTION 


The  Odd-Even  Reduction  Algorithm  has  been  widely  used  to 
solve  tridiagonal  systems  of  equations  in  serial  computers, 
especially   in  the  context  of  fast  Pois son-solvers  [  1]. 
The  parallel  form  of  the  algorithm  is  also  discussed  in 
Heller  [  2].   The  purpose  of  this  note  is  to  show  that  the 
structure  of  the  algorithm  is  just  right  for  implementa- 
tion on  an   ul t racomput er . 

The  system  of  equations  to  be  solved  is 

(1)  a.x.  ,+  b.x.  +  c.x.,T  =  y.  ,   i  =  1,2,.. .,N 
where 

(2)  a^  =  c^  =  0 

(3)  b  .  >  |a.  I  +  I c.  I 

To  derive  the  algorithm,  raise  and  lower   j   by   1   in  (1) , 

solve  the  resulting  equations  for   x. , ,   and   x.  ,   ,  and 

J+1         J-1 

substitute  these  results  in  (1).   The  result  is  a  system  of 
the  form 

(4)  a'x._2  ^  b'x.  +  cU.^2  =  y-'   J  =  1,2,..  .,N 

where 

a  .  a  .  . 

(5)  a-  =  -  I    -1-^ 

C  3. 

(6)  b!  =  b^  -  a.r-^^    -  c.  r^^ 


-  1  - 


(7) 


(8) 


c! 


c  . 


i+1 


J  +  1 


^-1 


These  formulae  hold  for   j  =  1,2,.  ..,N   provided  that  they 
are  interpreted  properly  at  the  end  points.   For  example, 
when  j  =  1  ,  equations  (5)  and  (6)  involve   a  ,  b   ,  and 
c    which  are  undefined.   These   quantities  are  multiplied 
by   a   =  0  ,  however,  so  we  use  the  convention   0*  unde- 
fined =  0.   With  this  understanding,  we  see  that 


(9) 


(10) 


•     —     o  ' 


^1  =  °   ■=)   ^1  =  ^2 


'^N  = 


0     ^    'Li-   S  =  ° 


It  can  also  be  shown  that 


(11)      b :  >  la!   +  c: 
J      J       1 


j  =  1,2,. . . ,N 


This  is  a  special  case  of  the  general  result  that  diagonal 
dominance  is  preserved  under  Gaussian  elimination. 

Thus  the  system  (4)  consists  of  two  uncoupled  systems,  one 
for   j   even  and  the  other  for   j   odd.   Each  of  these  sys- 
tems has  the  same  form  as  the  system  (1) ,  except  for  the 
numbering  of  the  unknowns.   In  a  serial  computer,  the  best 
strategy  is  to  work  with  one  of  these  systems  and  proceed 
recursively  until  only  one  unknown  remains.   The  amount  of 
work  is  then  proportional  to   N  +  N/2  +  N/4  +  . . .    <  2N  . 
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Once  this  unknown  has  been  found,  the  remaining  unknowns 
can  be  found  by  back-substitution  in  a  similar  number  of 
operations  . 

In  an  ultracomput er ,  however,  it  is  better  to  work  with 
both  systems  at  once,  since  the  two  sets  of  unknowns  can 
be  manipulated  in  parallel  at  no  extra  cost.   A  single 
unshuffle  operation  brings  the  odd  unknowns  into  one  half 
of  the  machine  and  the  even  unknowns  into  the  other. 
This  brings  the  system  (4)  back  to  tridiagonal  form  but 
now  there  are  zeros  in  the  appropriate  positions  to  un- 
couple the  unknowns  (1...N/2)  from  the  unknowns 
( (N/ 2) +1 . . . N) .   If  this  basic  operation  (reduction  followed 
by  unshuffling)  is  repeated  log„N  times,  then  all  of  the 
unknowns  are  uncoupled  from  each  other,  and  the  solution 
can  be  completed  by  solving  a  single  scalar  equation  in 
each  processor.   Incidentally,  the  components  of  the  solu- 
tion appear  in  their  proper  order,  since  exactly  log  N 
unshuffle  operations  have  been  performed.   Back  substitu- 
tion is,  of  course,  unnecessary,  since  all  of  the  unknowns 
have  been  determined  at  once.   The  machine  cycle  count  for 
the  entire  algorithm  is  proportional  to  log  N  . 

A  diagram  of  the  algorithm  follows  for  the  case   N  =  8  . 
The  vertical  lines  indicate  the  coupling  between  the  un- 
knowns at  each  stage.   The  slanting  lines  represent  the 
unshuffle  operation. 
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We  conclude  this  discussion  with  certain  remarks  which 
greatly  extend  the  usefullness  of  the  algorithm  outlined 
above.   In  many  applications  it  will  be  necessary  to 
solve  many  tridiagonal  systems  simultaneously,  instead 
of  just  one.   For  example,  in  a  physical  problem  in  three 

dimensions  with  an   N  xN  xN   mesh,  it  may  be  necessary  to 

2 
solve   N    tridiagonal  systems  of  order   N  ,  one  corre- 
sponding to  each  row  of  the  mesh  parallel  to  one  of  the 
coordinate  axes.   Fortunately,  several  tridiagonal  sys- 
tems can  be  thought  of  as  a  single  system  with  zeros  at 
the  appropriate  positions  to  provide  uncoupling.   Similar- 
ly, if  we  want  to  solve  a  single  tridiagonal  system  which 
does  not  fill  the  ul tracomputer ,  we  can  extend  it  by  add- 
ing dummy  equations  such  as   x.  =  0   so  that  the  extended 
tridiagonal  system  is  the  right  size. 
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Finally  it  often  happens  that  systems  which  are  almost 
tridiagonal  arise  in  practice.   For  example,  the  use  of 
periodic  boundary  conditions  leads  to  systems  of  equations 
which  are  tridiagonal  except  for  nonzero  matrix  elements 


^IN  =  ^1   ^^'^   Si  =  '^N 


(12) 


A  =  A   +  uv 
o 


These  can  be  solved  by  writing 
T 


where   A    is  tridiagonal  and   u,v   are  column  vectors. 
o 

Then  systems  of  the  form   Ax  =  y   can  be  solved  at  the 
expense  of  solving  two  systems  of  the  form   Ax   =  y   , 
a   =  1,2  .   Although  the  algorithm  is  a  little  more  compli- 
cated in  this  case,  the  machine  cycle  count  is  still  pro- 
portional to  log   N  . 
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